-
Notifications
You must be signed in to change notification settings - Fork 4.3k
feat(sagemaker): add support for serverless inference endpoints #35557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This review is outdated)
aad0c97 to
78ef21c
Compare
|
taking a look. |
|
❌ Features must contain a change to a README file. As this is a new feat we need
|
✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.
04fc444 to
5ff7875
Compare
2ab372b to
d8a868d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @pymia,
Thanks for your contribution. I have left some comments :)
|
|
||
| // Validate mutual exclusivity | ||
| if (props.instanceProductionVariants && props.serverlessProductionVariant) { | ||
| throw new Error('Cannot specify both instanceProductionVariants and serverlessProductionVariant. Choose one variant type.'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't able to find any documentation that says instanceProductVariant and serverlessProductVariant cannot be used simultaneously for a single endpoint. Could you please provide the source that refers to this restriction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The instance based deployment and serverless deployment should not exist at the same time.
Reference: Amazon SageMaker Deploy Model, and AWS::SageMaker::EndpointConfig ProductionVariant.
readme_serverless_section.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CDK has one README per service. SageMaker-alpha already has a README, so we don't need to create a new one.
We can add the documentation related to this PR in packages/@aws-cdk/aws-sagemaker-alpha/README.md.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved with the latest commit.
enhanced_integ_test.ts
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep the integration test in one file. We have another integration test file integ.endpoint-config. We can put all the necessary integration-test related code in that file and remove this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved with the latest commit.
- Remove standalone readme_serverless_section.md file - Remove enhanced_integ_test.ts file - Consolidate serverless tests into existing integ.endpoint-config.ts - Add comprehensive serverless test cases (minimal, full, boundary values) - Maintain existing documentation in main SageMaker README - Keep mutual exclusivity validation with AWS docs justification Addresses review comments in PR aws#35557
Pull request has been modified.
Implements SageMaker Serverless Inference endpoints as requested in issue aws#23148. - Add ServerlessProductionVariantProps interface with maxConcurrency, memorySizeInMB, and provisionedConcurrency - Extend EndpointConfig to support serverless variants alongside existing instance variants - Add comprehensive validation for serverless configuration parameters - Enforce mutual exclusivity between instance and serverless variants - Add CloudFormation template generation for ServerlessConfig properties - Include extensive test coverage for validation scenarios and error cases Closes aws#23148
…less inference - Add comprehensive serverless inference documentation to SageMaker alpha README - Update integration test with serverless endpoint configuration examples - Include verification comments for both instance-based and serverless endpoints - Generate CloudFormation snapshots with proper ServerlessConfig properties Addresses reviewer feedback requiring README documentation and integration test coverage for the new serverless inference feature.
…ch AWS specs - Update maxConcurrency validation range from 1-200 to 1-1000 - Update provisionedConcurrency validation range from 1-200 to 1-1000 - Fix memory size documentation from 3008MB to 3072MB in requirements - Add comprehensive test coverage for upper bound validation - Update TypeScript definitions and JSDoc comments This aligns the implementation with AWS SageMaker serverless endpoint specifications and RFC 431 requirements for L2 constructs.
- Remove standalone readme_serverless_section.md file - Remove enhanced_integ_test.ts file - Consolidate serverless tests into existing integ.endpoint-config.ts - Add comprehensive serverless test cases (minimal, full, boundary values) - Maintain existing documentation in main SageMaker README - Keep mutual exclusivity validation with AWS docs justification Addresses review comments in PR aws#35557
4a225e0 to
83c635e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the changes earlier. I have left a few more comments.
Thanks. :)
| } | ||
|
|
||
| // validate instance variant limits | ||
| if (hasInstanceVariants && this._instanceProductionVariants.length > 10) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think hasInstanceVariants is redundant here, if (this._instanceProductionVariants.length > 10) should be enough
| ### Serverless Inference | ||
|
|
||
| Amazon SageMaker Serverless Inference is a purpose-built inference option that makes it easy for you to deploy and scale ML models. Serverless endpoints automatically launch compute resources and scale them in and out depending on traffic, eliminating the need to choose instance types or manage scaling policies. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can add the link to the doc for further reference:
SageMaker ServerLess Inference
| * Render the serverless production variant. | ||
| */ | ||
| private renderServerlessProductionVariant(): CfnEndpointConfig.ProductionVariantProperty[] { | ||
| if (!this.serverlessProductionVariant) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should throw an error in this case. The design is to only call renderServerlessProductionVariant when serverlessProductionVariant is defined. Therefore, if serverlessProductionVariant is not defined, it should be treated as an error.
| */ | ||
| private renderInstanceProductionVariants(): CfnEndpointConfig.ProductionVariantProperty[] { | ||
| this.validateProductionVariants(); | ||
| return this._instanceProductionVariants.map( v => ({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add a validation here. If the instanceProductionVariant is empty we can throw an error.
| }, | ||
| }); | ||
|
|
||
| new IntegTest(app, 'integtest-endpointconfig', { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To ensure the resources were deployed as intended, we can perform assertions using API calls.
We can do the assertion like this:
const integ = new IntegTest(app, 'integtest-endpointconfig', {
testCases: [stack],
});
// Verify instance-based endpoint config
integ.assertions.awsApiCall('SageMaker', 'describeEndpointConfig', {
EndpointConfigName: endpointConfig.endpointConfigName,
}).expect(ExpectedResult.objectLike({
ProductionVariants: [
{ VariantName: 'firstVariant', InstanceType: 'ml.m5.large' },
{ VariantName: 'secondVariant' },
{ VariantName: 'thirdVariant' },
],
}));
// Verify serverless endpoint config
integ.assertions.awsApiCall('SageMaker', 'describeEndpointConfig', {
EndpointConfigName: serverlessEndpointConfig.endpointConfigName,
}).expect(ExpectedResult.objectLike({
ProductionVariants: [{
VariantName: 'serverlessVariant',
ServerlessConfig: {
MaxConcurrency: 10,
MemorySizeInMB: 2048,
ProvisionedConcurrency: 5,
},
}],
}));
| * | ||
| * The above command will result in the following output. | ||
| * | ||
| * For instance-based endpoint config, the above command will result in the following output: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are adding the assertions, we can remove this commented part. We are verifying what is created in the stack by the API call anyway.
|
This PR has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled. |
|
This issue has been reopened and is now available for discussion. |
Implements SageMaker Serverless Inference endpoints as requested in issue #23148.
Issue # 23148
Closes #23148.
Reason for this change
AWS SageMaker Serverless Inference is not supported in the CDK SageMaker L2 constructs. Users can only configure instance-based endpoints, missing the serverless option for intermittent/unpredictable traffic patterns that could benefit from cost-effective serverless inference.
This feature was explicitly planned in the original SageMaker Endpoint L2 construct RFC with Instance-prefixed classes designed to make room for Serverless-prefixed analogs.
Description of changes
Implements AWS SageMaker Serverless Inference support in CDK SageMaker L2 constructs, enabling cost-effective serverless endpoints for intermittent workloads:
ServerlessProductionVariantPropsinterface extendingProductionVariantPropswith AWS-compliant serverless properties:maxConcurrency: 1-200 range (required)memorySizeInMB: 1024-6144MB in 1GB increments (required)provisionedConcurrency: 1-200 range, optional, must be ≤ maxConcurrencyaddServerlessProductionVariant()method with comprehensive input validationEndpointConfigPropswith optionalserverlessProductionVariantpropertyServerlessConfigsupportUsage Example:
Describe any new or updated permissions being added
N/A - No new IAM permissions required. Leverages existing SageMaker model and endpoint permissions.
Description of how you validated changes
Unit tests: Added 12 comprehensive serverless variant tests covering all validation scenarios:
Integration tests: Extended existing integration test with serverless endpoint configuration, verified CloudFormation template generation with correct
ServerlessConfigproperties:Comprehensive testing results: 63/63 unit tests pass (100% success rate), 4/4 integration tests pass, no regressions detected across 16,024+ CDK tests
Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license